 case count


Predicting COVID-19 Prevalence Using Wastewater RNA Surveillance: A Semi-Supervised Learning Approach with Temporal Feature Trust

Chen, Yifei, Liang, Eric

arXiv.org Artificial Intelligence

As COVID-19 transitions into an endemic disease that remains constantly present in the population at a stable level, monitoring its prevalence without invasive measures becomes increasingly important. In this paper, we present a deep neural network estimator for the COVID-19 daily case count based on wastewater surveillance data and other confounding factors. This work builds upon the study by Jiang, Kolozsvary, and Li (2024), which connects the COVID-19 case counts with testing data collected early in the pandemic. Using the COVID-19 testing data and the wastewater surveillance data during the period when both data were highly reliable, one can train an artificial neural network that learns the nonlinear relation between the COVID-19 daily case count and the wastewater viral RNA concentration. From a machine learning perspective, the main challenge lies in addressing temporal feature reliability, as the training data has different reliability over different time periods.






Investigating the effectiveness of multimodal data in forecasting SARS-COV-2 case surges

Raghuvamsi, Palur Venkata, Loh, Siyuan Brandon, Bhattacharya, Prasanta, Ho, Joses, Chuen, Raphael Lee Tze, Han, Alvin X., Maurer-Stroh, Sebastian

arXiv.org Machine Learning

The COVID-19 pandemic response relied heavily on statistical and machine learning models to predict key outcomes such as case prevalence and fatality rates. These predictions were instrumental in enabling timely public health interventions that helped break transmission cycles. While most existing models are grounded in traditional epidemiological data, the potential of alternative datasets, such as those derived from genomic information and human behavior, remains underexplored. In the current study, we investigated the usefulness of diverse modalities of feature sets in predicting case surges. Our results highlight the relative effectiveness of biological (e.g., mutations), public health (e.g., case counts, policy interventions) and human behavioral features (e.g., mobility and social media conversations) in predicting country-level case surges. Importantly, we uncover considerable heterogeneity in predictive performance across countries and feature modalities, suggesting that surge prediction models may need to be tailored to specific national contexts and pandemic phases. Overall, our work highlights the value of integrating alternative data sources into existing disease surveillance frameworks to enhance the prediction of pandemic dynamics.


Predictors of disease outbreaks at continental scale in the African region: Insights and predictions with geospatial artificial intelligence using earth observations and routine disease surveillance data

Pezanowski, Scott, Koua, Etien Luc, Okeibunor, Joseph C, Gueye, Abdou Salam

arXiv.org Artificial Intelligence

Objectives: Our research adopts computational techniques to analyze disease outbreaks weekly over a large geographic area while maintaining local-level analysis by incorporating relevant high-spatial resolution cultural and environmental datasets. The abundance of data about disease outbreaks gives scientists an excellent opportunity to uncover patterns in disease spread and make future predictions. However, data over a sizeable geographic area quickly outpace human cognition. Our study area covers a significant portion of the African continent (about 17,885,000 km²). The data size makes computational analysis vital to assist human decision-makers. Methods: We first applied global and local spatial autocorrelation for malaria, cholera, meningitis, and yellow fever case counts. We then used machine learning to predict the weekly presence of these diseases in the second-level administrative district. Lastly, we used machine learning feature importance methods on the variables that affect spread. Results: Our spatial autocorrelation results show that geographic nearness is critical but varies in effect and space. Moreover, we identified many interesting hot and cold spots and spatial outliers. The machine learning model infers a binary class of cases or none with the best F1 score of 0.96 for malaria. Machine learning feature importance uncovered critical cultural and environmental factors affecting outbreaks and variations between diseases. Conclusions: Our study shows that data analytics and machine learning are vital to understanding and monitoring disease outbreaks locally across vast areas. The speed at which these methods produce insights can be critical during epidemics and emergencies.
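As a rough illustration of the global spatial autocorrelation step mentioned in this abstract, the sketch below computes global Moran's I, a common statistic for this purpose. The abstract does not say which statistic or software the authors used, so the choice of Moran's I, the function name `morans_i`, the four-district chain layout, and the case counts are all assumptions made up for the example.

```python
def morans_i(values, weights):
    """Global Moran's I for one variable observed over n spatial units.

    values  : list of n observations (e.g. case counts per district)
    weights : n x n spatial weight matrix, weights[i][j] > 0 if i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between neighbouring units.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * cross / variance_sum


# Hypothetical data: four districts in a row, each adjacent to its immediate neighbours.
case_counts = [1, 2, 3, 4]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
moran = morans_i(case_counts, adjacency)
```

A positive value indicates that neighbouring districts tend to have similar counts, a value near zero suggests no spatial pattern, and a negative value indicates that neighbours tend to differ.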


Machine Learning Models for Dengue Forecasting in Singapore

Lai, Zi Iun, Fung, Wai Kit, Chew, Enquan

arXiv.org Artificial Intelligence

With emerging prevalence beyond traditionally endemic regions, the global burden of dengue disease is forecasted to be one of the fastest growing. With limited direct treatment or vaccination currently available, prevention through vector control is widely believed to be the most effective form of managing outbreaks. This study examines traditional state space models (moving average, autoregressive, ARIMA, SARIMA), supervised learning techniques (XGBoost, SVM, KNN) and deep networks (LSTM, CNN, ConvLSTM) for forecasting weekly dengue cases in Singapore. Meteorological data and search engine trends were included as features for ML techniques. Forecasts using CNNs yielded the lowest RMSE for weekly cases in 2019.
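To make the evaluation metric concrete, here is a minimal sketch of scoring a one-step-ahead moving-average forecast with RMSE. It is not the study's pipeline: the helper names, the window length, and the weekly case numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def moving_average_forecast(series, window):
    """One-step-ahead forecasts: each prediction is the mean of the previous `window` values."""
    return [sum(series[t - window:t]) / window for t in range(window, len(series))]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical weekly case counts, not data from the study.
weekly_cases = [10, 12, 14, 16, 18, 20]
window = 2

forecasts = moving_average_forecast(weekly_cases, window)
# Forecasts start at index `window`, so compare against the matching tail of the series.
error = rmse(weekly_cases[window:], forecasts)
```

On a steadily rising series like this one, the moving average lags behind the data by a constant amount, which is the kind of systematic error that richer models are meant to reduce.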


Analyzing the Variations in Emergency Department Boarding and Testing the Transferability of Forecasting Models across COVID-19 Pandemic Waves in Hong Kong: Hybrid CNN-LSTM approach to quantifying building-level socioecological risk

Leung, Eman, Guan, Jingjing, Kwok, Kin On, Hung, CT, Ching, CC., Chung, CK., Tsang, Hector, Yeoh, EK, Lee, Albert

arXiv.org Artificial Intelligence

Emergency department (ED) boarding (defined as an ED waiting time greater than four hours) has been linked to poor patient outcomes and health system performance. Yet effective forecasting models were rare before COVID-19 and are lacking for the peri-COVID era. Here, a hybrid convolutional neural network (CNN)-long short-term memory (LSTM) model was applied to public-domain data sourced from Hong Kong's Hospital Authority, Department of Health, and Housing Authority. In addition, we sought to identify the phase of the COVID-19 pandemic that most significantly perturbed our complex adaptive healthcare system, thereby revealing a stable pattern of interconnectedness among its components, using deep transfer learning methodology. Our results show that 1) the greatest proportion of days with ED boarding was found between waves four and five; 2) the best-performing model for forecasting ED boarding was observed between waves four and five, and was based on features representing time-invariant residential buildings' built environment and sociodemographic profiles together with the historical time series of ED boarding and case counts, whereas during the waves the best-performing forecasts were based on time-series features alone; and 3) when the model built from the period between waves four and five was applied to data from other waves via deep transfer learning, the transferred model enhanced the performance of indigenous models.


Linking Across Data Granularity: Fitting Multivariate Hawkes Processes to Partially Interval-Censored Data

Calderon, Pio, Soen, Alexander, Rizoiu, Marian-Andrei

arXiv.org Artificial Intelligence

The multivariate Hawkes process (MHP) is widely used for analyzing data streams that interact with each other, where events generate new events within their own dimension (via self-excitation) or across different dimensions (via cross-excitation). However, in certain applications, the timestamps of individual events in some dimensions are unobservable, and only event counts within intervals are known, referred to as partially interval-censored data. The MHP is unsuitable for handling such data since its estimation requires event timestamps. In this study, we introduce the Partial Mean Behavior Poisson (PMBP) process, a novel point process which shares parameter equivalence with the MHP and can effectively model both timestamped and interval-censored data. We demonstrate the capabilities of the PMBP process using synthetic and real-world datasets. Firstly, we illustrate that the PMBP process can approximate MHP parameters and recover the spectral radius using synthetic event histories. Next, we assess the performance of the PMBP process in predicting YouTube popularity and find that it surpasses state-of-the-art methods. Lastly, we leverage the PMBP process to gain qualitative insights from a dataset comprising daily COVID-19 case counts from multiple countries and COVID-19-related news articles. By clustering the PMBP-modeled countries, we unveil hidden interaction patterns between occurrences of COVID-19 cases and news reporting.
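The self- and cross-excitation described in this abstract can be sketched by evaluating the conditional intensity of a multivariate Hawkes process. The abstract does not specify a kernel, so the exponential kernel, the function name `hawkes_intensity`, and all parameter values below are assumptions for illustration; this is the standard MHP intensity, which needs event timestamps, and not the PMBP process the paper introduces.

```python
import math


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of each dimension at time t.

    events[j]   : event times observed in dimension j
    mu[d]       : baseline rate of dimension d
    alpha[d][j] : expected number of dimension-d events triggered by one dimension-j event
    beta        : decay rate of the exponential kernel (an assumption of this sketch)
    """
    intensities = []
    for d in range(len(mu)):
        rate = mu[d]
        for j, times in enumerate(events):
            for s in times:
                if s < t:  # only strictly earlier events contribute
                    rate += alpha[d][j] * beta * math.exp(-beta * (t - s))
        intensities.append(rate)
    return intensities


# Hypothetical two-dimensional example (e.g. dimension 0 = cases, dimension 1 = news).
events = [[0.0, 1.0], [0.5]]
mu = [0.5, 0.2]
alpha = [[0.4, 0.1], [0.3, 0.2]]
lam = hawkes_intensity(2.0, events, mu, alpha, beta=1.0)
```

The diagonal entries of `alpha` drive self-excitation and the off-diagonal entries drive cross-excitation. The inner loop requires every timestamp in `events`, which is the information that is unavailable when a dimension is interval-censored.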